Abstract

This note shows that we can recover any complex vector $x_0 \in \mathbb{C}^n$ exactly from on the order of $n$ quadratic equations of the form $|\langle a_i, x_0\rangle|^2 = b_i$, $i = 1, \ldots, m$, by using a semidefinite program known as PhaseLift. This improves upon earlier bounds in [3], which required the number of equations to be at least on the order of $n \log n$. Further, we show that exact recovery holds for all input vectors simultaneously, and also demonstrate optimal recovery results from noisy quadratic measurements; these results are much sharper than previously known results.

1 Introduction

Suppose we wish to solve quadratic equations of the form

$|\langle a_i, x_0\rangle|^2 = b_i, \quad i = 1, \ldots, m,$   (1.1)

where $x_0 \in \mathbb{C}^n$ is unknown and $a_i \in \mathbb{C}^n$ and $b_i \in \mathbb{R}$ are given. This is a fundamental problem which includes all phase retrieval problems in which one cannot measure the phase of the linear measurements $\langle a_i, x_0\rangle$, only their magnitude. Recently, [2, 3] proposed finding solutions to (1.1) by convex programming techniques. The idea can be explained rather simply: lift the problem in higher dimensions and write $X = x x^*$ so that (1.1) can be formulated as

find        $X$
subject to  $X \succeq 0$, $\mathrm{rank}(X) = 1$,
            $\mathrm{tr}(a_i a_i^* X) = b_i$, $i = 1, \ldots, m$.

Then approximate this combinatorially hard problem by using a convex surrogate for the nonconvex rank functional: PhaseLift [2, 3] is the relaxation

minimize    $\mathrm{tr}(X)$
subject to  $X \succeq 0$,                                            (1.2)
            $\mathrm{tr}(a_i a_i^* X) = b_i$, $i = 1, \ldots, m$.

The main result in [3] states that if the equations are sufficiently randomized and their number $m$ is at least on the order of $n \log n$, then the solution to the convex relaxation (1.2) is exact.
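The lifting step can be checked numerically. The sketch below is only an illustration under stated assumptions: the dimensions, the random seed, and the variable names (`x0`, `a`, `X0`, `lifted`) are hypothetical, and NumPy is used only to verify that the quadratic measurements of $x_0$ coincide with the linear measurements $\mathrm{tr}(a_i a_i^* X)$ of the rank-one matrix $X = x_0 x_0^*$. It does not solve the semidefinite program.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 12  # illustrative sizes, not taken from the paper

# Hypothetical signal and complex Gaussian measurement vectors (rows of a).
x0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
a = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)

# Quadratic measurements b_i = |<a_i, x0>|^2 with <a_i, x0> = a_i^* x0.
b = np.abs(a.conj() @ x0) ** 2

# Lifted variable X0 = x0 x0^* and linear measurements tr(a_i a_i^* X0) = a_i^* X0 a_i.
X0 = np.outer(x0, x0.conj())
lifted = np.einsum("ij,jk,ik->i", a.conj(), X0, a).real

print(np.allclose(lifted, b))        # the two sets of measurements agree
print(np.linalg.matrix_rank(X0))     # X0 has rank one
```

The point of the identity is that the constraints become linear in $X$; all the nonconvexity is concentrated in the rank constraint, which the trace objective replaces.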
Theorem 1.1 ([3]) Fix $x_0 \in \mathbb{C}^n$ arbitrarily and suppose that

$m \ge c_0\, n \log n,$   (1.3)

where $c_0$ is a sufficiently large constant. Then in all models introduced below, PhaseLift is exact with probability at least $1 - 3e^{-\gamma m/n}$ ($\gamma$ is a positive numerical constant) in the sense that (1.2) has a unique solution equal to $x_0 x_0^*$.¹

The models above are either complex or real depending upon whether $x_0$ is complex or real valued. In all cases the $a_i$'s are independently and identically distributed with the following distributions:

• Complex models. The uniform distribution on the complex sphere of radius $\sqrt{n}$, or the complex normal distribution $\mathcal{N}(0, I_n/2) + i\,\mathcal{N}(0, I_n/2)$.

• Real models. The uniform distribution on the sphere of radius $\sqrt{n}$, or the normal distribution $\mathcal{N}(0, I_n)$.
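For readers who want to experiment, the four measurement models can be sampled as in the following sketch. The function name `sample_measurement_vectors` and the model labels are hypothetical; the only fact used beyond the definitions above is that normalizing a standard Gaussian vector yields a uniformly distributed point on the sphere.

```python
import numpy as np

def sample_measurement_vectors(m, n, model, rng):
    """Return an (m, n) array whose rows are iid draws of a_i (hypothetical helper)."""
    if model in ("real_gaussian", "real_sphere"):
        g = rng.standard_normal((m, n))                      # N(0, I_n)
    elif model in ("complex_gaussian", "complex_sphere"):
        # N(0, I_n/2) + i N(0, I_n/2)
        g = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
    else:
        raise ValueError(f"unknown model: {model}")
    if model.endswith("sphere"):
        # A normalized Gaussian vector is uniform on the unit sphere; rescale to radius sqrt(n).
        g = np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True)
    return g
```

For example, `sample_measurement_vectors(50, 8, "complex_sphere", np.random.default_rng(0))` returns 50 complex vectors, each of Euclidean norm $\sqrt{8}$.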

Clearly, one needs at least on the order of $n$ equations to have a well posed problem, namely, a unique solution to (1.1).² This raises natural questions:

Does the convex relaxation (1.2) with a number of equations on the order of the number of unknowns succeed? Or is the lower bound (1.3) sharp?

Is it possible to improve the guaranteed probability of success?

Can we hope for a universal result stating that once the vectors $a_i$ have been selected, all input signals $x_0$ can be recovered?

This paper answers these questions.

Theorem 1.2 Consider the setup of Theorem 1.1. Then for all $x_0$ in $\mathbb{C}^n$ or $\mathbb{R}^n$, the solution to PhaseLift is exact with probability at least $1 - O(e^{-\gamma m})$ if the number of equations obeys

$m \ge c_0\, n,$   (1.4)

where $c_0$ is a sufficiently large constant. Thus, exact recovery holds simultaneously over all input signals.

In words, (1) the solution to most systems of quadratic equations can be obtained by semidefinite programming as long as the number of equations is at least a constant times the number of unknowns; (2) the probability of failure is exponentially small in the number of measurements, a significant sharpening of Theorem 1.1; (3) these properties hold universally as explained above.

Letting $\mathcal{A} : \mathbb{C}^{n \times n} \to \mathbb{R}^m$ be the linear map $\mathcal{A}(X) = \{\mathrm{tr}(a_i a_i^* X)\}_{1 \le i \le m}$, Theorem 1.1 states that with high probability, the null space of $\mathcal{A}$ is tangent to the positive semidefinite (PSD) cone $\{X : X \succeq 0\}$ at the rank-one matrix $x_0 x_0^*$ for a fixed $x_0 \in \mathbb{C}^n$. In contrast, Theorem 1.2 asserts that this null space is tangent to the PSD cone at all rank-one elements. Mathematically, what makes this possible is the sharpening of the probability bounds; that is to say, the fact that for a fixed $x_0$, recovery holds with probability at least $1 - O(e^{-\gamma m})$. Importantly, this improvement cannot be obtained from the proof of Theorem 1.1. For instance, the argument in [3] does not allow removing the logarithmic factor in the number of equations; consequently, although the general organization of our proof is similar to that in [3], a different argument is needed.

In most applications of interest, we do not have noiseless data but rather observations of the form

$b_i = |\langle a_i, x_0\rangle|^2 + w_i, \quad i = 1, \ldots, m,$   (1.5)

where $w_i$ is a noise term. Here, we suggest recovering the signal by solving

minimize    $\sum_{1 \le i \le m} |\mathrm{tr}(a_i a_i^* X) - b_i|$
subject to  $X \succeq 0$.                                            (1.6)

¹ Upon retrieving $X = x_0 x_0^*$, a simple factorization recovers $x_0$ up to global phase, i.e. multiplication by a complex scalar of unit magnitude.

² The work in [1] shows that with probability one, $m = 4n - 2$ randomized equations as in Theorem 1.1 are sufficient for the intractable phase retrieval problem (1.1) to have a unique solution.

The proposal cannot be simpler: find the positive semidefinite matrix $X$ that best fits the observed data in an $\ell_1$ sense. One can then extract the best rank-one approximation to recover the signal. Our second result states that this procedure is accurate.
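The rank-one extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `rank_one_factor` and `distance_up_to_phase` are hypothetical, and the matrix passed in would in practice be the output of a semidefinite solver, which is not included here. The second helper measures the error modulo a global phase, since that is the best one can hope for.

```python
import numpy as np

def rank_one_factor(X):
    """Return sqrt(lambda_max) * v_max, the factor of the best rank-one PSD approximation."""
    X = (X + X.conj().T) / 2                 # symmetrize against round-off
    evals, evecs = np.linalg.eigh(X)         # eigenvalues in ascending order
    lam = max(evals[-1], 0.0)
    return np.sqrt(lam) * evecs[:, -1]

def distance_up_to_phase(x_hat, x0):
    """min over phi of ||x_hat - exp(i phi) x0||_2."""
    inner = np.vdot(x0, x_hat)               # x0^* x_hat
    phase = inner / abs(inner) if abs(inner) > 0 else 1.0
    return np.linalg.norm(x_hat - phase * x0)

# Noiseless sanity check on a hypothetical signal.
rng = np.random.default_rng(5)
x0 = rng.standard_normal(4) + 1j * rng.standard_normal(4)
x_hat = rank_one_factor(np.outer(x0, x0.conj()))
print(distance_up_to_phase(x_hat, x0))       # numerically zero
```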

Theorem 1.3 Consider the setup of Theorem 1.2. Then for all $x_0 \in \mathbb{C}^n$, the solution $\hat X$ to (1.6) obeys

$\|\hat X - x_0 x_0^*\|_F \le C_0\, \frac{\|w\|_1}{m}$   (1.7)

for some numerical constant $C_0$. For the Gaussian models, this holds with the same probability as in the noiseless case whereas the probability of failure is exponentially small in $n$ in the uniform model. By finding the eigenvector with largest eigenvalue of $\hat X$, one can also construct an estimate $\hat x$ obeying

$\|\hat x - e^{i\phi} x_0\|_2 \le C_0 \min\left(\|x_0\|_2,\ \frac{\|w\|_1}{m\,\|x_0\|_2}\right)$   (1.8)

for some $\phi \in [0, 2\pi]$.

In Section 2.3, we shall explain that these results are optimal and cannot possibly be improved. For now, we would like to stress that the bounds (1.7)-(1.8) considerably strengthen those found in [3]. To be sure, this reference shows that if the noise $w$ is known to be bounded, i.e. $\|w\|_2 \le \varepsilon$, then a relaxed version of (1.2) yields an estimate $\hat X$ obeying

$\|\hat X - x_0 x_0^*\|_F^2 \le C\, \varepsilon^2.$

In contrast, since $\|w\|_1 \le \sqrt{m}\,\|w\|_2 \le \sqrt{m}\,\varepsilon$, the new Theorem 1.3 gives

$\|\hat X - x_0 x_0^*\|_F^2 \le C\, \frac{\varepsilon^2}{m};$

this represents a substantial improvement.



2 Proofs 

We prove Theorems 1.2 and 1.3 in the real-valued case, the complex case being similar; see [3] for details. Next, the Gaussian and uniform models are nearly equivalent: indeed, suppose $a_i$ is uniformly sampled on the sphere; if $n\rho_i^2 \sim \chi_n^2$ and is independent of $a_i$, then $z_i = \rho_i a_i$ is normally distributed. Hence, with $b_i' = \rho_i^2 b_i$,

$b_i = |\langle a_i, x_0\rangle|^2 + w_i \iff b_i' = |\langle z_i, x_0\rangle|^2 + \rho_i^2 w_i.$

In the noiseless case, we have full equivalence. In the noisy case, we can transfer a bound for Gaussian measurements into the same bound for uniform measurements by changing the probability of success ever so slightly, as noted in Theorem 1.3. Thus, it suffices to study the real-valued Gaussian case.
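The reduction from the uniform model to the Gaussian model can be illustrated with a short simulation. The sketch assumes hypothetical sizes and noise level; the names `rho`, `z`, and `b_prime` mirror $\rho_i$, $z_i$, and $b_i'$ above. It only confirms the algebraic identity between the two measurement models and the fact that $\|z_i\|_2^2 = n\rho_i^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 5                                   # illustrative sizes
x0 = rng.standard_normal(n)
w = 0.1 * rng.standard_normal(m)              # hypothetical noise

# a_i uniform on the sphere of radius sqrt(n).
g = rng.standard_normal((m, n))
a = np.sqrt(n) * g / np.linalg.norm(g, axis=1, keepdims=True)

# Independent radii with n * rho_i^2 ~ chi^2_n, so that z_i = rho_i a_i is N(0, I_n).
rho = np.sqrt(rng.chisquare(n, size=m) / n)
z = rho[:, None] * a

b = (a @ x0) ** 2 + w                         # uniform-model data
b_prime = rho ** 2 * b                        # rescaled data

# The rescaled data are Gaussian-model measurements with noise rho_i^2 w_i.
print(np.allclose(b_prime, (z @ x0) ** 2 + rho ** 2 * w))
```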

We introduce some notation and with $[m] = \{1, \ldots, m\}$, we let $\mathcal{A} : \mathbb{R}^{n \times n} \to \mathbb{R}^m$ be the linear map $\mathcal{A}(X) = \{\mathrm{tr}(a_i a_i^* X)\}_{i \in [m]}$ whose adjoint is given by $\mathcal{A}^*(y) = \sum_{i \in [m]} y_i\, a_i a_i^*$. Note that vectors and matrices are boldfaced whereas scalars are not. In the sequel, we let $T$ be the subspace of symmetric matrices of the form $\{X = x x_0^* + x_0 x^* : x \in \mathbb{R}^n\}$ and $T^\perp$ be its orthogonal complement. For a symmetric matrix $X$, we put $X_T$ for the orthogonal projection of $X$ onto $T$ and likewise for $X_{T^\perp}$. Hence, $X = X_T + X_{T^\perp}$. Finally, $\|y\|_p$ is the $\ell_p$ norm of a vector $y$ and $\|X\|$ (resp. $\|X\|_F$) is the spectral (resp. Frobenius) norm of a matrix $X$.
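The notation above translates directly into a few lines of NumPy in the real-valued case. The helper names (`A_op`, `A_adj`, `proj_T`, `proj_T_perp`) are hypothetical, and the projections assume $\|x_0\|_2 = 1$, in which case $X_{T^\perp} = (I - x_0 x_0^T) X (I - x_0 x_0^T)$ and $X_T = X - X_{T^\perp}$. The checks at the end verify the adjoint identity $\langle \mathcal{A}(X), y\rangle = \langle X, \mathcal{A}^*(y)\rangle$ and the orthogonal decomposition.

```python
import numpy as np

def A_op(X, a):
    """A(X)_i = tr(a_i a_i^T X) = a_i^T X a_i, with a_i the rows of a."""
    return np.einsum("ij,jk,ik->i", a, X, a)

def A_adj(y, a):
    """A^*(y) = sum_i y_i a_i a_i^T."""
    return (a.T * y) @ a

def proj_T_perp(X, x0):
    """Projection onto T^perp, assuming x0 is unit-normed."""
    Q = np.eye(len(x0)) - np.outer(x0, x0)
    return Q @ X @ Q

def proj_T(X, x0):
    """Projection onto T = {x x0^T + x0 x^T}, assuming x0 is unit-normed."""
    return X - proj_T_perp(X, x0)

rng = np.random.default_rng(6)
n, m = 5, 9                                   # illustrative sizes
a = rng.standard_normal((m, n))
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
B = rng.standard_normal((n, n))
X = B + B.T                                   # arbitrary symmetric matrix
y = rng.standard_normal(m)

XT, XTp = proj_T(X, x0), proj_T_perp(X, x0)
print(np.isclose(A_op(X, a) @ y, np.sum(X * A_adj(y, a))))   # adjoint identity
print(abs(np.sum(XT * XTp)) < 1e-10)                          # <X_T, X_Tperp> = 0
```

Elements of $T$ have rank at most 2, which is the property used repeatedly in the proofs below.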
2.1 Dual certificates

We begin by specializing Lemmas 3.1 and 3.2 from [3].

Lemma 2.1 ([3]) There is an event $E$ of probability at least $1 - 5e^{-\gamma_0 m}$ such that on $E$, any symmetric positive semidefinite matrix obeys

$m^{-1}\|\mathcal{A}(X)\|_1 \le (1 + 1/8)\,\mathrm{tr}(X),$

and any symmetric rank-2 matrix obeys

$m^{-1}\|\mathcal{A}(X)\|_1 \ge 0.94\,(1 - 1/8)\,\|X\|.$
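As a sanity check of the first inequality only, note that for $X \succeq 0$ every entry of $\mathcal{A}(X)$ is nonnegative, so $m^{-1}\|\mathcal{A}(X)\|_1 = \mathrm{tr}(X S)$ with $S = m^{-1}\sum_i a_i a_i^T$, and hence $m^{-1}\|\mathcal{A}(X)\|_1 \le \lambda_{\max}(S)\,\mathrm{tr}(X)$. The sketch below, with hypothetical sizes chosen so that $m$ is much larger than $n$, evaluates $\lambda_{\max}(S)$ for one real Gaussian draw. It is an illustration of the upper bound, not a verification of the lemma, and it does not address the rank-2 lower bound.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 5, 20000                               # illustrative sizes with m >> n
a = rng.standard_normal((m, n))               # real Gaussian model

# Sample covariance; its top eigenvalue controls the upper bound for all PSD X at once.
S = a.T @ a / m
lam_max = np.linalg.eigvalsh(S)[-1]

# One hypothetical PSD test matrix.
B = rng.standard_normal((n, n))
X = B @ B.T
lhs = np.abs(np.einsum("ij,jk,ik->i", a, X, a)).sum() / m

print(lam_max)                                # close to 1 when m >> n
print(lhs <= (1 + 1 / 8) * np.trace(X))
```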

The following intermediate result is novel, although we became aware of a similar argument in [4] as we finished this paper.

Lemma 2.2 Suppose there is a matrix $Y$ in the range of $\mathcal{A}^*$ obeying $Y_{T^\perp} \preceq -I_{T^\perp}$ and $\|Y_T\|_F \le 1/2$. Then on the event $E$ from Lemma 2.1, $X_0 = x_0 x_0^*$ is PhaseLift's unique feasible point.

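Proof Suppose $x_0 x_0^* + H$ is feasible, which implies that (1) $H_{T^\perp} \succeq 0$ and (2) $H$ is in the null space of $\mathcal{A}$ so that $\langle H, Y\rangle = 0 = \langle H_T, Y_T\rangle + \langle H_{T^\perp}, Y_{T^\perp}\rangle$. On the one hand,

$\langle H_T, Y_T\rangle = -\langle H_{T^\perp}, Y_{T^\perp}\rangle \ge \langle H_{T^\perp}, I_{T^\perp}\rangle = \mathrm{tr}(H_{T^\perp}).$

Lemma 2.1 asserts that $m^{-1}\|\mathcal{A}(H_{T^\perp})\|_1 \le (1 + 1/8)\,\mathrm{tr}(H_{T^\perp})$ and $m^{-1}\|\mathcal{A}(H_T)\|_1 \ge 0.94\,(1 - 1/8)\,\|H_T\|$. Since $\mathcal{A}(H_T) = -\mathcal{A}(H_{T^\perp})$, this gives

$\mathrm{tr}(H_{T^\perp}) \ge \frac{\|\mathcal{A}(H_{T^\perp})\|_1}{(1 + 1/8)\,m} = \frac{\|\mathcal{A}(H_T)\|_1}{(1 + 1/8)\,m} \ge 0.73\,\|H_T\| \ge \frac{0.73}{\sqrt{2}}\,\|H_T\|_F,$   (2.1)

where the last inequality is a consequence of the fact that $H_T$ has rank at most 2. On the other hand,

$|\langle H_T, Y_T\rangle| \le \|H_T\|_F\,\|Y_T\|_F \le \frac{1}{2}\,\|H_T\|_F.$   (2.2)

Since $0.73/\sqrt{2} > 1/2$, (2.1) and (2.2) give that $H_T = 0$, which in turn implies that $H_{T^\perp} = 0$. This completes the proof. ■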
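The numerical constants in this proof are easy to verify. The few lines below, with hypothetical variable names, recompute the ratio of the two constants in Lemma 2.1 and the comparison with $1/2$ on which the argument rests.

```python
import math

upper = 1 + 1 / 8                 # constant in the upper bound of Lemma 2.1
lower = 0.94 * (1 - 1 / 8)        # constant in the lower bound of Lemma 2.1

ratio = lower / upper             # coefficient of ||H_T|| in (2.1)
frob_coeff = 0.73 / math.sqrt(2)  # coefficient of ||H_T||_F in (2.1)

print(ratio >= 0.73)              # the rounding down to 0.73 is legitimate
print(frob_coeff > 0.5)           # strict gap with the bound ||Y_T||_F <= 1/2 in (2.2)
```

The gap between `frob_coeff` and $1/2$ is small, which is why the later construction aims for a certificate with $\|Y_T\|_F$ well below $1/2$.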

To prove Theorem 1.2, it remains to construct a matrix $Y$ obeying the conditions of Lemma 2.2 for all $x_0 \in \mathbb{R}^n$. We proceed in two steps: we first show that for a fixed $x_0$, one can find $Y$ with high probability, and then use this property to show that one can find $Y$ for all $x_0$.

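Lemma 2.3 Fix $x_0 \in \mathbb{R}^n$. Then with probability at least $1 - O(e^{-\gamma m})$, there exists $Y$ obeying $\|Y_{T^\perp} + \frac{17}{10} I_{T^\perp}\| \le \frac{1}{10}$ and $\|Y_T\|_F \le \frac{3}{20}$. In addition, one can take $Y = \mathcal{A}^*(\lambda)$ with $\|\lambda\|_\infty \le 7/m$.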

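Proof We assume that $\|x_0\|_2 = 1$ without loss of generality. Our strategy is to show that

$Y = \frac{1}{m} \sum_{i \in [m]} \left[\, |\langle a_i, x_0\rangle|^2\, 1(|\langle a_i, x_0\rangle| \le 3) - \beta_0 \,\right] a_i a_i^T := Y^{(0)} - Y^{(1)},$   (2.3)

where $\beta_0 = E\, z^4\, 1(|z| \le 3) \approx 2.6728$ with $z \sim \mathcal{N}(0, 1)$, is a valid certificate. Here $Y^{(0)}$ collects the truncated terms and $Y^{(1)} = \frac{\beta_0}{m} \sum_{i \in [m]} a_i a_i^T$. As claimed, $\|\lambda\|_\infty \le 7/m$.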
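The candidate certificate can be formed explicitly for a simulated draw. The sketch below is only an illustration: the sizes, the seed, and the choice $x_0 = e_1$ are assumptions, `BETA0` is the value quoted in the text, and $m$ is taken far larger than $n$ so that the random fluctuations are small. It builds the weights in front of $a_i a_i^T$, which are bounded by $7/m$ deterministically because of the truncation at 3, and then evaluates the two quantities controlled by Lemma 2.3.

```python
import numpy as np

BETA0 = 2.6728                                # E z^4 1(|z| <= 3), as quoted in the text

rng = np.random.default_rng(3)
n, m = 3, 200000                              # illustrative sizes with m >> n
x0 = np.zeros(n)
x0[0] = 1.0                                   # unit-normed signal (assumption)
a = rng.standard_normal((m, n))               # real Gaussian model

# Weights lambda_i and certificate Y = A^*(lambda) = sum_i lambda_i a_i a_i^T.
z = a @ x0
lam = (z ** 2 * (np.abs(z) <= 3) - BETA0) / m
Y = (a.T * lam) @ a

# Projections onto T^perp and T for unit-normed x0.
Q = np.eye(n) - np.outer(x0, x0)              # this is also I_{T^perp}
Y_Tperp = Q @ Y @ Q
Y_T = Y - Y_Tperp

dev_perp = np.linalg.norm(Y_Tperp + 1.7 * Q, 2)   # spectral norm
norm_T = np.linalg.norm(Y_T, "fro")
print(np.max(np.abs(lam)) * m, dev_perp, norm_T)
```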

We begin by checking the condition $Y_{T^\perp} \preceq -I_{T^\perp}$. First, the matrix $Y^{(1)}$ is Wishart and standard results in random matrix theory (e.g. Corollary 5.35 in [5]) assert that

$\|Y^{(1)} - E\,Y^{(1)}\| = \|Y^{(1)} - \beta_0 I\| \le \beta_0/40$

with probability at least $1 - 2e^{-\gamma m}$ provided that $m \ge c\,n$, where $c$ is sufficiently large. In particular, we have

$\|Y^{(1)}_{T^\perp} - \beta_0 I_{T^\perp}\| \le \beta_0/40.$   (2.4)
Second, letting $x'$ denote the projection of a vector $x$ onto the orthogonal complement of $\mathrm{span}(x_0)$, we have

$Y^{(0)}_{T^\perp} = \frac{1}{m} \sum_{i \in [m]} \xi_i \xi_i^T, \qquad \xi_i = \langle a_i, x_0\rangle\, 1(|\langle a_i, x_0\rangle| \le 3)\, a_i'.$

It is immediate to check that the $\xi_i$'s are iid copies of a zero-mean, isotropic and sub-Gaussian random vector $\xi$. In particular, with $z \sim \mathcal{N}(0, 1)$,

$E\, \xi \xi^T = \alpha_0\, I_{T^\perp}, \qquad \alpha_0 = E\, z^2\, 1(|z| \le 3) \approx 0.9707.$

Again, standard results about random matrices with sub-Gaussian rows (e.g. Theorem 5.39 in [5]) give

$\|Y^{(0)}_{T^\perp} - E\, Y^{(0)}_{T^\perp}\| = \|Y^{(0)}_{T^\perp} - \alpha_0\, I_{T^\perp}\| \le \alpha_0/40$   (2.5)

with probability at least $1 - 2e^{-\gamma m}$ provided that $m \ge c\,n$, where $c$ is sufficiently large. Clearly, (2.4) together with (2.5) yield the first condition $\|Y_{T^\perp} + \frac{17}{10} I_{T^\perp}\| \le \frac{1}{10}$.
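To see how the constants fit together, note that $E\, Y_{T^\perp} = (\alpha_0 - \beta_0)\, I_{T^\perp}$, so the triangle inequality bounds the deviation from $-\frac{17}{10} I_{T^\perp}$ by $\beta_0/40 + \alpha_0/40 + |\alpha_0 - \beta_0 + 1.7|$. The sketch below evaluates the truncated Gaussian moments in closed form via integration by parts (these closed forms and the helper names `phi` and `tail` are supplied here for illustration, not taken from the paper) and checks that the total stays below $1/10$.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def tail(t):
    """Standard normal upper tail P(z > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

t = 3.0
# E z^2 1(|z| <= t) = 1 - 2 [ t phi(t) + P(z > t) ]
alpha0 = 1 - 2 * (t * phi(t) + tail(t))
# E z^4 1(|z| <= t) = 3 - 2 [ (t^3 + 3 t) phi(t) + 3 P(z > t) ]
beta0 = 3 - 2 * ((t ** 3 + 3 * t) * phi(t) + 3 * tail(t))

# Deviation budget: fluctuation terms from (2.4), (2.5) plus the offset of the mean.
slack = beta0 / 40 + alpha0 / 40 + abs(alpha0 - beta0 + 1.7)
print(alpha0, beta0, slack)
```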

We now establish $\|Y_T\|_F \le 3/20$. To begin with, set $y = Y x_0$ and observe that since $\|Y_T\|_F^2 = |\langle y, x_0\rangle|^2 + 2\|y'\|_2^2$, it suffices to verify that

$|\langle y, x_0\rangle|^2 \le 1/20 \quad \text{and} \quad \|y'\|_2^2 \le 1/10.$

Write $\langle y, x_0\rangle = \frac{1}{m} \sum_{i \in [m]} \xi_i$, where the $\xi_i$ are iid copies of $\xi = z^4\, 1(|z| \le 3) - \beta_0 z^2$, $z \sim \mathcal{N}(0, 1)$. Note that $\xi$ is a mean-zero sub-exponential random variable since the first term is bounded and the second is a squared Gaussian variable. Thus, Bernstein's inequality (e.g. Corollary 5.17 in [5]) gives

$P(|\langle y, x_0\rangle| \ge 1/\sqrt{20}) \le 2\exp(-\gamma m)$

for some numerical constant $\gamma$. Finally, write $y'$ as

$y' = \frac{1}{m} Z' c, \qquad Z' = [a_1', \ldots, a_m'], \qquad c_i = \langle a_i, x_0\rangle^3\, 1(|\langle a_i, x_0\rangle| \le 3) - \beta_0 \langle a_i, x_0\rangle, \quad i \in [m].$

Note that $Z'$ and $c$ are independent. On the one hand, the $c_i^2$'s are iid sub-exponential variables and Corollary 5.17 in [5] gives

$P(\|c\|_2^2 - E\|c\|_2^2 \ge m) \le 2e^{-\gamma m}$

for some numerical constant $\gamma > 0$. This shows that

$\|c\|_2^2 \le (\delta_0 + 1)\, m, \qquad \delta_0 = E\,(z^3\, 1(|z| \le 3) - \beta_0 z)^2 \approx 4.0663, \quad z \sim \mathcal{N}(0, 1),$   (2.6)

with probability at least $1 - 2e^{-\gamma m}$. On the other hand, for a fixed $x$ obeying $\|x\|_2 = 1$, $\|Z' x\|_2^2$ is distributed as a $\chi^2$ random variable with $n - 1$ degrees of freedom and it follows that

$P(\|Z' x\|_2^2 \ge m/52) \le e^{-\gamma m}$   (2.7)

for some numerical constant $\gamma > 0$ with the proviso that $m \ge c\,n$ and $c$ is sufficiently large. We omit the details. To conclude, (2.6) and (2.7) give that with probability at least $1 - 3e^{-\gamma m}$,

$\|y'\|_2^2 = \frac{1}{m^2}\|Z' c\|_2^2 \le (1 + \delta_0)/52 \le 1/10.$

This concludes the proof. ■
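The constant $\delta_0$ and the role of the factor $52$ can be checked in the same way as $\alpha_0$ and $\beta_0$. Expanding the square gives $\delta_0 = E\, z^6\, 1(|z| \le 3) - \beta_0^2$, and the truncated sixth moment has a closed form obtained by integration by parts. The closed forms and the names `phi`, `tail`, and `m6` below are supplied for illustration and are not part of the paper.

```python
import math

def phi(t):
    """Standard normal density."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def tail(t):
    """Standard normal upper tail P(z > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

t = 3.0
beta0 = 3 - 2 * ((t ** 3 + 3 * t) * phi(t) + 3 * tail(t))
# E z^6 1(|z| <= t) = 15 - 2 [ (t^5 + 5 t^3 + 15 t) phi(t) + 15 P(z > t) ]
m6 = 15 - 2 * ((t ** 5 + 5 * t ** 3 + 15 * t) * phi(t) + 15 * tail(t))

# E (z^3 1(|z|<=t) - beta0 z)^2 = m6 - 2 beta0 * beta0 + beta0^2 = m6 - beta0^2
delta0 = m6 - beta0 ** 2
print(delta0, (1 + delta0) / 52)
```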
2.2 Proof of Theorem 1.2

The proof of Theorem 1.2 is now a consequence of the corollary below.

Corollary 2.4 With probability at least $1 - O(e^{-\gamma m})$, for all $x_0 \in \mathbb{R}^n$, there exists $Y$ obeying the conditions of Lemma 2.2. In addition, one can take $Y = \mathcal{A}^*(\lambda)$ with $\|\lambda\|_\infty \le 7/m$.

The reason why this corollary holds is straightforward: Lemma 2.3 holds true for exponentially many points and a sort of continuity argument allows us to extend it to all points. Again it suffices to establish the property for unit-normed vectors.

Proof Let $\mathcal{N}_\epsilon$ be an $\epsilon$-net for the unit sphere with cardinality obeying $|\mathcal{N}_\epsilon| \le (1 + 2/\epsilon)^n$ by Lemma 5.2 in [5].³ If $c_0$ is sufficiently large, Lemma 2.3 implies that with probability at least $1 - O(e^{-\gamma m}(1 + 2/\epsilon)^n) \ge 1 - O(e^{-\gamma' m})$, for all $x_0 \in \mathcal{N}_\epsilon$, there exists $Y = \mathcal{A}^*(\lambda)$ obeying

$\|Y_{T_0^\perp} + 1.7\, I_{T_0^\perp}\| \le 0.1$   (2.8a)
$\|Y_{T_0}\|_F \le 0.15$   (2.8b)

and $\|\lambda\|_\infty \le 7/m$ (we wrote $T_0$ in place of $T$ for convenience). Note that this gives

$\|Y\| \le \|Y_{T_0}\| + \|Y_{T_0^\perp}\| \le 0.15 + 1.8 \le 2.$

Consider now an arbitrary unit-normed vector $x$ and let $x_0 \in \mathcal{N}_\epsilon$ be any element such that $\|x - x_0\|_2 \le \epsilon$. Set $\Delta = x x^T - x_0 x_0^T$, which obeys

$\|\Delta\|_F \le \|x (x - x_0)^T\|_F + \|(x - x_0) x_0^T\|_F = \|x\|_2 \|x - x_0\|_2 + \|x - x_0\|_2 \|x_0\|_2 \le 2\epsilon.$

Suppose $Y$ is as in (2.8) and let $T$ be $\{X = y x^T + x y^T : y \in \mathbb{R}^n\}$. We have

$Y_{T^\perp} + 1.7\, I_{T^\perp} = (I - x x^T)\, Y\, (I - x x^T) + 1.7\,(I - x x^T) = Y_{T_0^\perp} + 1.7\, I_{T_0^\perp} - R,$

where

$R = \Delta Y (I - x_0 x_0^T) + (I - x_0 x_0^T)\, Y \Delta - \Delta Y \Delta + 1.7\, \Delta.$

Since

$\|R\| \le 2\|Y\|\|\Delta\|\|I - x_0 x_0^T\| + \|Y\|\|\Delta\|^2 + 1.7\|\Delta\| \le 11.4\,\epsilon + 8\,\epsilon^2,$

we see that the first condition of Lemma 2.2 holds whenever $\epsilon$ is small enough. For the second condition,

$Y_T = x x^T Y + (I - x x^T)\, Y\, x x^T = Y_{T_0} + R,$

where

$R = \Delta Y (I - x_0 x_0^T) + (I - x_0 x_0^T)\, Y \Delta - \Delta Y \Delta.$

Since $\Delta$ has rank at most 2, $\mathrm{rank}(R) \le 2$, and

$\|R\|_F \le \sqrt{2}\,\|R\| \le \sqrt{2}\,(2\|Y\|\|\Delta\|\|I - x_0 x_0^T\| + \|Y\|\|\Delta\|^2) \le 8\sqrt{2}\,(\epsilon + \epsilon^2).$

Choosing $\epsilon$ sufficiently small concludes the proof of the corollary. ■
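The perturbation identities used in this proof are deterministic and can be confirmed numerically. In the sketch below everything is an assumption made for illustration: the dimension, the seed, the size of the perturbation, and the symmetric matrix `Y`, which is an arbitrary stand-in rather than an actual dual certificate. The code checks $\|\Delta\|_F \le 2\epsilon$, the decomposition of $Y_{T^\perp} + 1.7\, I_{T^\perp}$, and the spectral-norm bounds on the two remainder terms.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5                                          # illustrative dimension

# Two nearby unit vectors; eps is their actual distance.
x0 = rng.standard_normal(n)
x0 /= np.linalg.norm(x0)
x = x0 + 0.02 * rng.standard_normal(n)
x /= np.linalg.norm(x)
eps = np.linalg.norm(x - x0)

B = rng.standard_normal((n, n))
Y = B + B.T                                    # arbitrary symmetric stand-in for the certificate

Delta = np.outer(x, x) - np.outer(x0, x0)
P0 = np.eye(n) - np.outer(x0, x0)              # I - x0 x0^T
P = np.eye(n) - np.outer(x, x)                 # I - x x^T

# First remainder: Y_{Tperp} + 1.7 I_{Tperp} = Y_{T0perp} + 1.7 I_{T0perp} - R.
R = Delta @ Y @ P0 + P0 @ Y @ Delta - Delta @ Y @ Delta + 1.7 * Delta
lhs = P @ Y @ P + 1.7 * P
rhs = P0 @ Y @ P0 + 1.7 * P0 - R

# Second remainder, computed directly as Y_T - Y_{T0}.
R_T = (Y - P @ Y @ P) - (Y - P0 @ Y @ P0)

nY, nD = np.linalg.norm(Y, 2), np.linalg.norm(Delta, 2)
print(np.linalg.norm(Delta, "fro") <= 2 * eps)
print(np.allclose(lhs, rhs))
print(np.linalg.norm(R, 2) <= 2 * nY * nD + nY * nD ** 2 + 1.7 * nD)
print(np.linalg.norm(R_T, 2) <= 2 * nY * nD + nY * nD ** 2)
```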

³ For any unit-normed vector $x$, there is $x_0 \in \mathcal{N}_\epsilon$ with $\|x_0\|_2 = 1$ and $\|x - x_0\|_2 \le \epsilon$, where $\epsilon > 0$.
2.3 Stability

To see why the stability result (1.7) is optimal, suppose without loss of generality that $\|x_0\|_2 = 1$. Further, imagine that we are informed that $\|w\|_1 \le \delta m$ for some known $\delta$. Since $\|\mathcal{A}(x_0 x_0^*)\|_1 \approx m$, it would not be possible to distinguish between solutions of the form $(1 + \lambda)\, x_0 x_0^*$ for $\max(0, 1 - \delta) \le 1 + \lambda \le 1 + \delta$. Hence, the error in the Frobenius norm may be as large as $\delta\, \|x_0 x_0^*\|_F = \delta$, which is what the theorem gives.

We now turn to the proof of Theorem 1.3. We do not need to show the second part as the perturbation argument is exactly the same as in [3, Theorem 1.2]. The argument for the first part follows that of the earlier Lemma 2.2, and makes use of the existence of a dual certificate $Y = \mathcal{A}^*(\lambda)$ obeying the conditions of this lemma.

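Set $\hat X = x_0 x_0^* + H$. Since $\hat X$ is feasible, $H_{T^\perp} \succeq 0$ and $\langle H, Y\rangle = \langle \mathcal{A}(H), \lambda\rangle$. First,

$\langle H_T, Y_T\rangle = \langle \mathcal{A}(H), \lambda\rangle - \langle H_{T^\perp}, Y_{T^\perp}\rangle \ge -\|\lambda\|_\infty \|\mathcal{A}(H)\|_1 + \langle H_{T^\perp}, I_{T^\perp}\rangle \ge -\frac{7}{m}\|\mathcal{A}(H)\|_1 + \mathrm{tr}(H_{T^\perp}).$

Second, since $\mathcal{A}(H_{T^\perp}) = \mathcal{A}(H) - \mathcal{A}(H_T)$, we also have

$\|\mathcal{A}(H_{T^\perp})\|_1 \ge \|\mathcal{A}(H_T)\|_1 - \|\mathcal{A}(H)\|_1,$

which by the same argument as before, yields

$\mathrm{tr}(H_{T^\perp}) \ge \frac{0.73}{\sqrt{2}}\,\|H_T\|_F - \frac{8}{9m}\,\|\mathcal{A}(H)\|_1.$

Since $|\langle H_T, Y_T\rangle| \le \frac{1}{2}\|H_T\|_F$, we have established that⁴

$\|H_T\|_F \le \frac{C}{m}\,\|\mathcal{A}(H)\|_1$

for some numerical constant $C$. Also, since $H_{T^\perp}$ is positive semidefinite,

$\|H_{T^\perp}\|_F \le \mathrm{tr}(H_{T^\perp}) \le \frac{1}{2}\|H_T\|_F + \frac{7}{m}\|\mathcal{A}(H)\|_1$

so that

$\|H\|_F \le \frac{C_1}{m}\,\|\mathcal{A}(H)\|_1 \le \frac{2 C_1}{m}\,\|w\|_1$

for a numerical constant $C_1$. To see why the second inequality is true, observe that

$\|b - \mathcal{A}(x_0 x_0^* + H)\|_1 = \|w - \mathcal{A}(H)\|_1 \le \|b - \mathcal{A}(x_0 x_0^*)\|_1 = \|w\|_1,$

which gives $\|\mathcal{A}(H)\|_1 \le 2\|w\|_1$ by the triangle inequality. The proof is complete.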



Acknowledgements 

E. C. is partially supported by AFOSR under grant FA9550-09-1-0643 and by ONR under grant N00014-09-1-0258. 
This work was partially presented at the University of California at Berkeley in January 2012, and at the University 
of British Columbia in February 2012. 

⁴ The careful reader will note that we can get a far better constant by observing that the proof of Theorem 1.2 also yields $\|Y_T\|_F \le 1/4$. Hence, we have $\|H_T\|_F \le 4\,(8/9 + 7)\,\|\mathcal{A}(H)\|_1/m$.